Abstract

In the Network Inference problem, one seeks to recover the edges of an unknown graph from the observations of cascades propagating over this graph. In this paper, we approach this problem from the sparse recovery perspective. We introduce a general model of cascades, including the voter model and the independent cascade model, for which we provide the first algorithm which recovers the graph's edges with high probability and $O(s \log m)$ measurements, where $s$ is the maximum degree of the graph and $m$ is the number of nodes. Furthermore, we show that our algorithm also recovers the edge weights (the parameters of the diffusion process) and is robust in the context of approximate sparsity. Finally, we prove an almost matching lower bound of $\Omega(s \log \frac{m}{s})$ and validate our approach empirically on synthetic graphs.


1. Introduction 

Graphs have been extensively studied for their propagative abilities: connectivity, routing, gossip algorithms, etc. A diffusion process taking place over a graph provides valuable information about the presence and weights of its edges. Influence cascades are a specific type of diffusion process in which a particular infectious behavior spreads over the nodes of the graph. By only observing the "infection times" of the nodes in the graph, one might hope to recover the underlying graph and the parameters of the cascade model. This problem is known in the literature as the Network Inference problem.

More precisely, solving the Network Inference problem involves designing an algorithm that takes as input a set of observed cascades (realisations of the diffusion process) and recovers with high probability a large fraction of the graph's edges. The goal is then to understand the relationship between the number of observations, the probability of success, and the accuracy of the reconstruction.

The Network Inference problem can be decomposed and analyzed "node-by-node". Thus, we will focus on a single node of degree $s$ and discuss how to identify its parents among the $m$ nodes of the graph. Prior work has shown that the required number of observed cascades is $O(\mathrm{poly}(s) \log m)$ (Netrapalli & Sanghavi, 2012; Abrahao et al., 2013).

A more recent line of research (Daneshmand et al., 2014) has focused on applying advances in sparse recovery to the network inference problem. Indeed, the graph can be interpreted as a "sparse signal" measured through influence cascades and then recovered. The challenge is that influence cascade models typically lead to non-linear inverse problems and the measurements (the state of the nodes at different time steps) are usually correlated. The sparse recovery literature suggests that $\Omega(s \log \frac{m}{s})$ cascade observations should be sufficient to recover the graph (Donoho, 2006; Candes & Tao, 2006). However, the best known upper bound to this day is $O(s^2 \log m)$ (Netrapalli & Sanghavi, 2012; Daneshmand et al., 2014).

The contributions of this paper are the following: 

• we formulate the Graph Inference problem in the context of discrete-time influence cascades as a sparse recovery problem for a specific type of Generalized Linear Model. This formulation notably encompasses the well-studied Independent Cascade Model and Voter Model.

• we give an algorithm which recovers the graph's edges using $O(s \log m)$ cascades. Furthermore, we show that our algorithm is also able to efficiently recover the edge weights (the parameters of the influence model) up to an additive error term.

• we show that our algorithm is robust in cases where the signal to recover is approximately $s$-sparse by proving guarantees in the stable recovery setting.

• we provide an almost tight lower bound of $\Omega(s \log \frac{m}{s})$ observations required for sparse recovery.

The paper is organized as follows: we conclude the introduction with a survey of the related work. In Section 2 we present our model of Generalized Linear Cascades and the associated sparse recovery formulation. Its theoretical guarantees are presented for various recovery settings in Section 3. The lower bound is presented in Section 4. Finally, we conclude with experiments in Section 5.

Related Work The study of edge prediction in graphs has been an active field of research for over a decade (Liben-Nowell & Kleinberg, 2008; Leskovec et al., 2007; Adar & Adamic, 2005). (Gomez Rodriguez et al., 2010) introduced the Netinf algorithm, which approximates the likelihood of cascades represented as a continuous process. The algorithm was improved in later work (Gomez-Rodriguez et al., 2011), but is not known to have any theoretical guarantees besides empirical validation on synthetic networks. Netrapalli & Sanghavi (2012) studied the discrete-time version of the independent cascade model and obtained the first $O(s^2 \log m)$ recovery guarantee on general networks. The algorithm is based on a likelihood function similar to the one we propose, without the $\ell_1$-norm penalty. Their analysis depends on a correlation decay assumption, which limits the number of new infections at every step. In this setting, they show a lower bound on the number of cascades needed for support recovery with constant probability of the order $\Omega(s \log(m/s))$. They also suggest a Greedy algorithm, which achieves an $O(s \log m)$
guarantee in the case of tree graphs. The work of (Abrahao et al., 2013) studies the same continuous-model framework as (Gomez Rodriguez et al., 2010) and obtains an $O(s^9 \log^2 s \log m)$ support recovery algorithm, without the correlation decay assumption. (Du et al., 2013) propose an algorithm similar to ours for recovering the weights of the graph under a continuous-time independent cascade model, without proving theoretical guarantees.

Closest to this work is a recent paper by Daneshmand et al. (2014), wherein the authors consider an $\ell_1$-regularized objective function. They adapt standard results from sparse recovery to obtain a recovery bound of $O(s^3 \log m)$ under an irrepresentability condition (Zhao & Yu, 2006). Under stronger assumptions, they match the (Netrapalli & Sanghavi, 2012) bound of $O(s^2 \log m)$, by exploiting similar properties of the convex program's KKT conditions. In contrast, our work studies discrete-time diffusion processes including the Independent Cascade model under weaker assumptions. Furthermore, we analyze both the recovery of the graph's edges and the estimation of the model's parameters, and achieve close to optimal bounds.


The work of (Du et al., 2014) is slightly orthogonal to ours 
since they suggest learning the influence function, rather 
than the parameters of the network directly. 

2. Model 

We consider a graph $\mathcal{G} = (V, E, \Theta)$, where $\Theta$ is a $|V| \times |V|$ matrix of parameters describing the edge weights of $\mathcal{G}$. Intuitively, $\Theta_{i,j}$ captures the "influence" of node $i$ on node $j$. Let $m \equiv |V|$. For each node $j$, let $\theta_j$ be the $j$-th column vector of $\Theta$. A discrete-time Cascade model is a Markov process over a finite state space $\{0, 1, \dots, K-1\}^V$ with the following properties:

1. Conditioned on the previous time step, the transition events between two states in $\{0, 1, \dots, K-1\}$ for each $i \in V$ are mutually independent across $i \in V$.

2. Of the $K$ possible states, there exists a contagious state such that all transition probabilities of the Markov process can be expressed as a function of the graph parameters $\Theta$ and the set of "contagious nodes" at the previous time step.

3. The initial probability over $\{0, 1, \dots, K-1\}^V$ is such that all nodes can eventually reach a contagious state with non-zero probability. The "contagious" nodes at $t = 0$ are called source nodes.

In other words, a cascade model describes a diffusion process where a set of contagious nodes "influence" other nodes in the graph to become contagious. An influence cascade is a realisation of this random process, i.e. the successive states of the nodes in graph $\mathcal{G}$. Note that both the "single source" assumption made in (Daneshmand et al., 2014) and (Abrahao et al., 2013) as well as the "uniformly chosen source set" assumption made in (Netrapalli & Sanghavi, 2012) verify condition 3. Also note that the multiple-source node assumption does not reduce to the single-source assumption, even under the assumption that cascades do not overlap. Imagine, for example, two cascades starting from two different nodes: since we do not observe which node propagated the contagion to which node, we cannot attribute an infected node to either cascade and treat the problem as two independent cascades.

In the context of Network Inference, (Netrapalli & Sanghavi, 2012) focus on the well-known discrete-time independent cascade model recalled below, which (Abrahao et al., 2013) and (Daneshmand et al., 2014) generalize to continuous time. We extend the independent cascade model in a different direction by considering a more general class of transition probabilities while staying in the discrete-time setting. We observe that despite their obvious differences, both the independent cascade and the voter models make the network inference problem similar to the standard generalized linear model inference problem. In fact, we define a class of diffusion processes for which this is true: the Generalized Linear Cascade Models. The linear threshold model is a special case and is discussed in Section 6.

2.1. Generalized Linear Cascade Models 

Let susceptible denote any state which can become contagious at the next time step with a non-zero probability. We draw inspiration from generalized linear models to introduce Generalized Linear Cascades:

Definition 1. Let $X^t$ be the indicator variable of "contagious nodes" at time step $t$. A generalized linear cascade model is a cascade model such that for each susceptible node $j$ in state $s$ at time step $t$, the event that $j$ becomes "contagious" at time step $t+1$, conditioned on $X^t$, is a Bernoulli variable of parameter $f(\theta_j \cdot X^t)$:

$$\mathbb{P}(X_j^{t+1} = 1 \mid X^t) = f(\theta_j \cdot X^t) \tag{1}$$

where $f : \mathbb{R} \to [0,1]$.

In other words, each generalized linear cascade provides, for each node $j \in V$, a series of measurements $(X^t, X_j^{t+1})_{t \in \mathcal{T}_j}$ sampled from a generalized linear model. Note also that $\mathbb{E}[X_i^{t+1} \mid X^t] = f(\theta_i \cdot X^t)$. As such, $f$ can be interpreted as the inverse link function of our generalized linear cascade model.

2.2. Examples 

2.2.1. Independent Cascade Model 

In the independent cascade model, nodes can be either susceptible, contagious or immune. At $t = 0$, all source nodes are "contagious" and all remaining nodes are "susceptible". At each time step $t$, for each edge $(i,j)$ where $j$ is susceptible and $i$ is contagious, $i$ attempts to infect $j$ with probability $p_{i,j} \in [0,1]$; the infection attempts are mutually independent. If $i$ succeeds, $j$ will become contagious at time step $t+1$. Regardless of $i$'s success, node $i$ will be immune at time $t+1$, such that nodes stay contagious for only one time step. The cascade process terminates when no contagious nodes remain.
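The dynamics described above can be sketched as a short simulation. This is only an illustrative sketch: the function name `simulate_ic_cascade`, the dense matrix layout `p[i, j]`, and the choice to record the final all-zero state are assumptions made here, not part of the model definition.

```python
import numpy as np

def simulate_ic_cascade(p, sources, rng):
    """Simulate one discrete-time independent cascade (illustrative sketch).

    p[i, j]  : probability that contagious node i infects susceptible node j
    sources  : boolean array marking the contagious nodes at t = 0
    Returns the list of indicator vectors X^0, X^1, ... of contagious nodes,
    ending with the all-False vector reached when the cascade terminates.
    """
    m = p.shape[0]
    contagious = np.asarray(sources, dtype=bool).copy()
    susceptible = ~contagious
    states = [contagious.copy()]
    while contagious.any():
        # Every pair (i, j) draws an independent infection attempt.
        attempts = rng.random((m, m)) < p
        # j is infected if some contagious i succeeds and j is still susceptible.
        infected = (attempts & contagious[:, None]).any(axis=0) & susceptible
        susceptible &= ~infected
        # Previously contagious nodes become immune: they are neither
        # susceptible nor contagious from now on.
        contagious = infected
        states.append(contagious.copy())
    return states
```

Each returned vector plays the role of the indicator of contagious nodes at one time step, so consecutive pairs of states give the measurements used for inference later in the paper.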

If we denote by $X^t$ the indicator variable of the set of contagious nodes at time step $t$, then if $j$ is susceptible at time step $t+1$, we have:

$$\mathbb{P}\big[X_j^{t+1} = 1 \mid X^t\big] = 1 - \prod_{i=1}^m (1 - p_{i,j})^{X_i^t}$$

Defining $\Theta_{i,j} \equiv \log\big(\frac{1}{1 - p_{i,j}}\big)$, this can be rewritten as:

$$\mathbb{P}\big[X_j^{t+1} = 1 \mid X^t\big] = 1 - \prod_{i=1}^m e^{-\Theta_{i,j} X_i^t} = 1 - e^{-\Theta_j \cdot X^t} \tag{IC}$$

Therefore, the independent cascade model is a Generalized Linear Cascade model with inverse link function $f : z \mapsto 1 - e^{-z}$. Note that to write the Independent Cascade Model as a Generalized Linear Cascade Model, we had to introduce the change of variable $\Theta_{i,j} = \log\big(\frac{1}{1 - p_{i,j}}\big)$. The recovery results in Section 3 pertain to the $\Theta_j$ parameters. Fortunately, the following lemma shows that the recovery error on $\Theta_j$ is an upper bound on the error on the original $p_j$ parameters.

Lemma 1. $\|\theta - \theta^*\|_2 \geq \|p - p^*\|_2$.
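As a quick numerical illustration of the change of variable, the sketch below evaluates the infection probability in both parametrizations and compares the two error norms of Lemma 1. The vectors `p_j`, `p_star` and `x` are made-up values chosen for illustration only.

```python
import numpy as np

def ic_prob_from_p(p_j, x):
    """Infection probability of node j written with the original p_{i,j}."""
    return 1.0 - np.prod((1.0 - p_j) ** x)

def ic_prob_from_theta(theta_j, x):
    """Same probability in generalized linear form, with f(z) = 1 - exp(-z)."""
    return 1.0 - np.exp(-(theta_j @ x))

def to_theta(p):
    """Change of variable Theta = log(1 / (1 - p)), applied entrywise."""
    return np.log(1.0 / (1.0 - p))

# Hypothetical incoming edge probabilities and contagious-node indicator.
p_j = np.array([0.2, 0.5, 0.0, 0.7])
x = np.array([1.0, 0.0, 1.0, 1.0])
theta_j = to_theta(p_j)

# A second hypothetical parameter vector, to compare the two error norms.
p_star = np.array([0.3, 0.4, 0.1, 0.6])
theta_err = np.linalg.norm(theta_j - to_theta(p_star))
p_err = np.linalg.norm(p_j - p_star)
```

The inequality of Lemma 1 holds in this example because the map from $\Theta$ back to $p$, namely $p = 1 - e^{-\Theta}$, has derivative at most one for non-negative $\Theta$.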

2.2.2. The Linear Voter Model 

In the Linear Voter Model, nodes can be either red or blue. Without loss of generality, we can suppose that the blue nodes are contagious. The parameters of the graph are normalized such that $\forall i,\ \sum_j \Theta_{i,j} = 1$. Each round, every node $j$ independently chooses one of its neighbors with probability $\Theta_{i,j}$ and adopts their color. The cascade stops at a fixed horizon time $T$ or if all nodes are of the same color. If we denote by $X^t$ the indicator variable of the set of blue nodes at time step $t$, then we have:

$$\mathbb{P}\big[X_j^{t+1} = 1 \mid X^t\big] = \sum_{i=1}^m \Theta_{i,j} X_i^t = \Theta_j \cdot X^t \tag{V}$$

Thus, the linear voter model is a Generalized Linear Cascade model with inverse link function $f : z \mapsto z$.

2.2.3. Discretization of Continuous Model 

Another motivation for the Generalized Linear Cascade model is that it captures the time-discretized formulation of the well-studied continuous-time independent cascade model with exponential transmission function (CICE) of (Gomez Rodriguez et al., 2010; Abrahao et al., 2013; Daneshmand et al., 2014). Assume that the temporal resolution of the discretization is $\varepsilon$, i.e. all nodes whose (continuous) infection time is within the interval $[k\varepsilon, (k+1)\varepsilon)$ are considered infected at (discrete) time step $k$. Let $X^k$ be the indicator vector of the set of nodes 'infected' before or during the $k$-th time interval. Note that contrary to the discrete-time independent cascade model, $X_j^t = 1 \implies X_j^{t+1} = 1$, that is, there is no immune state and nodes remain contagious forever.

Figure 1. Illustration of the sparse-recovery approach. Our objective is to recover the unknown weight vector $\theta_j$ for each node $j$. We observe a Bernoulli realization whose parameters are given by applying $f$ to the matrix-vector product, where the measurement matrix encodes which nodes are "contagious" at each time step.

Let $\mathrm{Exp}(p)$ be an exponentially-distributed random variable of parameter $p$ and let $\Theta_{i,j}$ be the rate of transmission along directed edge $(i,j)$ in the CICE model. By the memoryless property of the exponential, if $X_j^k \neq 1$:

$$\mathbb{P}(X_j^{k+1} = 1 \mid X^k) = \mathbb{P}\Big(\min_{i \in \mathcal{N}(j)} \mathrm{Exp}(\Theta_{i,j}) \leq \varepsilon\Big) = \mathbb{P}\Big(\mathrm{Exp}\Big(\sum_{i=1}^m \Theta_{i,j} X_i^k\Big) \leq \varepsilon\Big) = 1 - e^{-\varepsilon\, \Theta_j \cdot X^k}$$

Therefore, the $\varepsilon$-discretized CICE-induced process is a Generalized Linear Cascade model with inverse link function $f : z \mapsto 1 - e^{-\varepsilon z}$.
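The step from the minimum of exponentials to a single exponential uses the standard fact that the minimum of independent exponential variables is exponential with rate equal to the sum of the rates. A short derivation, writing $E_i \sim \mathrm{Exp}(\Theta_{i,j})$ for the transmission times from the currently contagious parents of $j$ (the independence of the $E_i$ is assumed here, as in the CICE model):

```latex
\begin{align*}
\mathbb{P}\Big(\min_{i \,:\, X_i^k = 1} E_i > \varepsilon\Big)
  &= \prod_{i \,:\, X_i^k = 1} \mathbb{P}(E_i > \varepsilon)
   = \prod_{i \,:\, X_i^k = 1} e^{-\varepsilon \Theta_{i,j}}
   = e^{-\varepsilon \sum_{i=1}^m \Theta_{i,j} X_i^k}, \\
\mathbb{P}\big(X_j^{k+1} = 1 \mid X^k\big)
  &= 1 - e^{-\varepsilon\, \Theta_j \cdot X^k}.
\end{align*}
```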

2.2.4. Logistic Cascades 

"Logistic cascades" is the specific case where the inverse link function is given by the logistic function $f(z) = 1/(1 + e^{-z+t})$. Intuitively, this captures the idea that there is a threshold $t$ such that when the sum of the parameters of the infected parents of a node is larger than the threshold, the probability of getting infected is close to one. This is a smooth approximation of the hard threshold rule of the Linear Threshold Model (Kempe et al., 2003). As we will see later in the analysis, for logistic cascades, the graph inference problem becomes a linear inverse problem.

2.3. Maximum Likelihood Estimation 

Inferring the model parameter $\Theta$ from observed influence cascades is the central question of the present work. Recovering the edges in $E$ from observed influence cascades is a well-identified problem known as the Network Inference problem. However, recovering the influence parameters is no less important. In this work we focus on recovering $\Theta$, noting that the set of edges $E$ can then be recovered through the following equivalence: $(i,j) \in E \Leftrightarrow \Theta_{i,j} \neq 0$.
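The equivalence between edges and non-zero parameters suggests a simple way to read a graph off an estimated weight matrix. The sketch below is an illustration under assumptions: the name `edges_from_weights` and the `threshold` parameter are hypothetical, and the default `threshold=0.0` matches the equivalence above only when weights are non-negative; a positive threshold is a practical choice for noisy estimates.

```python
import numpy as np

def edges_from_weights(theta_hat, threshold=0.0):
    """Return the set of directed edges (i, j) with theta_hat[i, j] > threshold."""
    rows, cols = np.nonzero(np.asarray(theta_hat) > threshold)
    return {(int(i), int(j)) for i, j in zip(rows, cols)}

# Hypothetical 2-node estimate: a strong edge 0 -> 1 and a weak entry 1 -> 0.
theta_hat = np.array([[0.0, 0.5],
                      [0.05, 0.0]])
exact_support = edges_from_weights(theta_hat)
thresholded = edges_from_weights(theta_hat, threshold=0.1)
```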

Given observations $(x^1, \dots, x^n)$ of a cascade model, we can recover $\Theta$ via Maximum Likelihood Estimation (MLE). Denoting by $\mathcal{L}$ the log-likelihood function, we consider the following $\ell_1$-regularized MLE problem:

$$\hat{\Theta} \in \operatorname*{argmax}_{\Theta} \frac{1}{n} \mathcal{L}(\Theta \mid x^1, \dots, x^n) - \lambda \|\Theta\|_1$$

where $\lambda$ is the regularization factor which helps prevent overfitting and controls the sparsity of the solution.

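The generalized linear cascade model is decomposable in the following sense: given Definition 1, the log-likelihood can be written as the sum of $m$ terms, each term $i \in \{1, \dots, m\}$ only depending on $\theta_i$. Since this is equally true for $\|\Theta\|_1$, each column $\theta_i$ of $\Theta$ can be estimated by a separate optimization program:

$$\hat{\theta}_i \in \operatorname*{argmax}_{\theta} \mathcal{L}_i(\theta_i \mid x^1, \dots, x^n) - \lambda \|\theta_i\|_1 \tag{2}$$

where we denote by $\mathcal{T}_i$ the time steps at which node $i$ is susceptible and:

$$\mathcal{L}_i(\theta_i \mid x^1, \dots, x^n) = \frac{1}{|\mathcal{T}_i|} \sum_{t \in \mathcal{T}_i} x_i^{t+1} \log f(\theta_i \cdot x^t) + (1 - x_i^{t+1}) \log\big(1 - f(\theta_i \cdot x^t)\big)$$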
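The per-node log-likelihood is a normalized sum of Bernoulli log-probabilities and is straightforward to evaluate. The sketch below is one possible implementation under assumptions: the array layout (one row of `X` per time step at which node $i$ is susceptible), the averaging over the number of rows, and the helper names `node_log_likelihood` and `ic_link` are choices made here for illustration.

```python
import numpy as np

def ic_link(z):
    """Inverse link of the independent cascade model, f(z) = 1 - exp(-z)."""
    return 1.0 - np.exp(-z)

def node_log_likelihood(theta_i, X, y, f):
    """Average Bernoulli log-likelihood for a single node i.

    X : (n, m) array, row t is the contagious-node indicator x^t
    y : (n,) array, y[t] = x_i^{t+1} (1 if node i became contagious)
    f : inverse link function applied to theta_i . x^t
    Returns -inf when an observation has probability zero under theta_i.
    """
    prob = f(X @ theta_i)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Select log f for positive outcomes and log(1 - f) otherwise, so
        # that a 0 * log(0) product is never formed.
        terms = np.where(y == 1, np.log(prob), np.log1p(-prob))
    return float(np.mean(terms))

# Two hypothetical measurements of a node with two potential parents.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1, 0])
theta_i = np.array([np.log(2.0), np.log(4.0)])
value = node_log_likelihood(theta_i, X, y, ic_link)
```

Subtracting $\lambda \|\theta_i\|_1$ from this value gives the objective of program (2) for the chosen inverse link.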

In the case of the voter model, the measurements include all time steps until we reach the time horizon $T$ or the graph coalesces to a single state. For the independent cascade model, the measurements include all time steps until node $i$ becomes contagious, after which its behavior is deterministic. Contrary to prior work, our results depend on the number of measurements and not the number of cascades.

Regularity assumptions To solve program (2) efficiently, we would like it to be convex. A sufficient condition is to assume that $\mathcal{L}_i$ is concave, which is the case if $f$ and $(1 - f)$ are both log-concave. Remember that a twice-differentiable function $f$ is log-concave iff $f'' f \leq f'^2$. It is easy to verify this property for $f$ and $(1 - f)$ in the Independent Cascade Model and Voter Model.
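For completeness, the log-concavity check can be written out explicitly. The derivation below takes $f(z) = 1 - e^{-z}$ with $z \geq 0$ for the Independent Cascade Model and $f(z) = z$ for the Voter Model, as in Section 2.2, and checks that the second derivative times the function is at most the squared first derivative for both $f$ and $1 - f$:

```latex
\begin{align*}
\text{IC: }\; & f(z) = 1 - e^{-z},\quad f'(z) = e^{-z},\quad f''(z) = -e^{-z}
  \;\Longrightarrow\; f''f = -e^{-z}\,(1 - e^{-z}) \;\le\; 0 \;\le\; e^{-2z} = f'^2, \\
 & (1-f)(z) = e^{-z}
  \;\Longrightarrow\; (1-f)''\,(1-f) = e^{-2z} = \big((1-f)'\big)^2, \\
\text{Voter: }\; & f(z) = z
  \;\Longrightarrow\; f''f = 0 \;\le\; 1 = f'^2, \qquad
  (1-f)(z) = 1 - z
  \;\Longrightarrow\; (1-f)''\,(1-f) = 0 \;\le\; 1 = \big((1-f)'\big)^2 .
\end{align*}
```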

Furthermore, the data-dependent bounds in Section 3.1 will require the following regularity assumption on the inverse link function $f$: there exists $\alpha \in (0,1)$ such that

$$\max\big\{ |(\log f)'(z_x)|,\ |(\log(1 - f))'(z_x)| \big\} \leq \frac{1}{\alpha} \tag{LF}$$

for all $z_x \equiv \theta^* \cdot x$ such that $f(z_x) \notin \{0, 1\}$.

In the voter model, $\frac{f'}{f}(z) = \frac{1}{z}$ and $\frac{f'}{1-f}(z) = \frac{1}{1-z}$. Hence (LF) will hold as soon as $\alpha \leq \Theta_{i,j} \leq 1 - \alpha$ for all $(i,j) \in E$, which is always satisfied for some $\alpha$ for non-isolated nodes. In the Independent Cascade Model, $\frac{f'}{f}(z) = \frac{1}{e^{z} - 1}$ and $\frac{f'}{1-f}(z) = 1$. Hence (LF) holds as soon as $p_{i,j} \geq \alpha$ for all $(i,j) \in E$, which is always satisfied for some $\alpha \in (0,1)$.

For the data-independent bound of Proposition 1, we will require the following additional regularity assumption:

$$\max\big\{ |(\log f)''(z_x)|,\ |(\log(1 - f))''(z_x)| \big\} \leq \frac{1}{\alpha} \tag{LF2}$$

for some $\alpha \in (0,1)$ and for all $z_x \equiv \theta^* \cdot x$ such that $f(z_x) \notin \{0, 1\}$. It is again easy to see that this condition is verified for the Independent Cascade Model and the Voter model for the same $\alpha \in (0,1)$.

Convex constraints The voter model is only defined when $\Theta_{i,j} \in (0,1)$ for all $(i,j) \in E$. Similarly the independent cascade model is only defined when $\Theta_{i,j} > 0$. Because the likelihood function $\mathcal{L}_i$ is equal to $-\infty$ when the parameters are outside of the domain of definition of the models, these constraints do not need to appear explicitly in the optimization program.

In the specific case of the voter model, the constraint $\sum_j \Theta_{i,j} = 1$ will not necessarily be verified by the estimator obtained in (2). In some applications, the experimenter might not need this constraint to be verified, in which case the results in Section 3 still give a bound on the recovery error. If this constraint needs to be satisfied, then by Lagrangian duality, there exists a $\lambda \in \mathbb{R}$ such that adding $\lambda\big(\sum_j \Theta_{i,j} - 1\big)$ to the objective function of (2) enforces the constraint. Then, it suffices to apply the results of Section 3 to the augmented objective to obtain the same recovery guarantees. Note that the added term is linear and will easily satisfy all the required regularity assumptions.


3. Results

In this section, we apply the sparse recovery framework to analyze under which assumptions our program (2) recovers the true parameter $\theta_i$ of the cascade model. Furthermore, if we can estimate $\theta_i$ to a sufficiently good accuracy, it is then possible to recover the support of $\theta_i$ by simple thresholding, which provides a solution to the standard Network Inference problem.

We will first give results in the exactly sparse setting in which $\theta_i$ has a support of size exactly $s$. We will then relax this sparsity constraint and give results in the stable recovery setting where $\theta_i$ is approximately $s$-sparse.

As mentioned in Section 2.3, the maximum likelihood estimation program is decomposable. We will henceforth focus on a single node $i \in V$ and omit the subscript $i$ in the notations when there is no ambiguity. The recovery problem is now the one of estimating a single vector $\theta^*$ from a set $\mathcal{T}$ of observations. We will write $n \equiv |\mathcal{T}|$.

3.1. Main Theorem

In this section, we analyze the case where $\theta^*$ is exactly sparse. We write $S \equiv \mathrm{supp}(\theta^*)$ and $s = |S|$. Recall that $\theta_i$ is the vector of weights for all edges directed at the node we are solving for. In other words, $S$ is the set of all nodes susceptible to influence node $i$, also referred to as its parents. Our main theorem will rely on the now standard restricted eigenvalue condition introduced by (Bickel et al., 2009a).

Definition 2. Let $\Sigma \in \mathcal{S}_m(\mathbb{R})$ be a real symmetric matrix and $S$ be a subset of $\{1, \dots, m\}$. Define $\mathcal{C}(S) \equiv \{X \in \mathbb{R}^m : \|X_{S^c}\|_1 \leq 3 \|X_S\|_1\}$. We say that $\Sigma$ satisfies the $(S, \gamma)$-restricted eigenvalue condition iff:

$$\forall X \in \mathcal{C}(S),\quad X^T \Sigma X \geq \gamma \|X\|_2^2 \tag{RE}$$

A discussion of the $(S, \gamma)$-(RE) assumption in the context of generalized linear cascade models can be found in Section 3.3. In our setting we require that the (RE)-condition holds for the Hessian of the log-likelihood function $\mathcal{L}$: it essentially captures the fact that the binary vectors of the set of active nodes (i.e. the measurements) are not too collinear.

Theorem 1. Assume the Hessian $\nabla^2 \mathcal{L}(\theta^*)$ satisfies the $(S, \gamma)$-(RE) for some $\gamma > 0$ and that (LF) holds for some $\alpha > 0$. For any $\delta \in (0,1)$, let $\hat{\theta}$ be the solution of (2) with $\lambda \equiv 2 \sqrt{\frac{\log m}{\alpha n^{1-\delta}}}$, then:

$$\|\hat{\theta} - \theta^*\|_2 \leq \frac{6}{\gamma} \sqrt{\frac{s \log m}{\alpha n^{1-\delta}}} \quad \text{w.p. } 1 - \frac{1}{e^{n^\delta \log m}} \tag{3}$$


Note that we have expressed the convergence rate in the 
number of measurements n, which is different from the 
number of cascades. For example, in the case of the voter 
model with horizon time T and for N cascades, we can 
expect a number of measurements proportional to N x T. 
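To get a feel for how the guarantee scales with the number of measurements, the quantities of Theorem 1 can be evaluated directly. The sketch below is only a calculator for the stated formulas: the function name and the example values of $\alpha$, $\gamma$ and $\delta$ are hypothetical, and in practice $\gamma$ and $\alpha$ are properties of the unknown model rather than inputs one can choose.

```python
import math

def regularization_and_bound(n, m, s, alpha, gamma, delta):
    """Evaluate lambda, the l2 error bound and the failure probability of
    Theorem 1 for n measurements, m nodes and sparsity s."""
    rate = math.log(m) / (alpha * n ** (1.0 - delta))
    lam = 2.0 * math.sqrt(rate)                   # regularization factor
    bound = (6.0 / gamma) * math.sqrt(s * rate)   # bound on the l2 error
    fail_prob = math.exp(-(n ** delta) * math.log(m))
    return lam, bound, fail_prob

# Voter model example: N cascades observed up to horizon T give roughly
# n = N * T measurements (hypothetical values below).
N, T = 1000, 10
lam, bound, fail_prob = regularization_and_bound(
    n=N * T, m=300, s=15, alpha=0.2, gamma=0.5, delta=0.5)
```

Note that the bound is exactly $3 \lambda \sqrt{s} / \gamma$, which makes the role of the regularization factor in the error explicit.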

Theorem 1 is a consequence of Theorem 1 in (Negahban et al., 2012) which gives a bound on the convergence rate of regularized estimators. We state their theorem in the context of $\ell_1$ regularization in Lemma 2.

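Lemma 2. Let $\mathcal{C}(S) \equiv \{\Delta \in \mathbb{R}^m \mid \|\Delta_{S^c}\|_1 \leq 3 \|\Delta_S\|_1\}$. Suppose that:

$$\forall \Delta \in \mathcal{C}(S),\quad \mathcal{L}(\theta^* + \Delta) - \mathcal{L}(\theta^*) - \nabla \mathcal{L}(\theta^*) \cdot \Delta \geq \kappa_{\mathcal{L}} \|\Delta\|_2^2 - \tau_{\mathcal{L}}^2(\theta^*) \tag{4}$$

for some $\kappa_{\mathcal{L}} > 0$ and function $\tau_{\mathcal{L}}$. Finally suppose that $\lambda \geq 2 \|\nabla \mathcal{L}(\theta^*)\|_\infty$. Then if $\hat{\theta}_\lambda$ is the solution of (2):

$$\|\hat{\theta}_\lambda - \theta^*\|_2^2 \leq 9 \frac{\lambda^2 s}{\kappa_{\mathcal{L}}^2} + \frac{\lambda}{\kappa_{\mathcal{L}}}\, 2 \tau_{\mathcal{L}}^2(\theta^*)$$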

To prove Theorem 1, we apply Lemma 2 with $\tau_{\mathcal{L}} = 0$. Since $\mathcal{L}$ is twice differentiable and convex, assumption (4) with $\kappa_{\mathcal{L}} = \frac{\gamma}{2}$ is implied by the (RE)-condition. For a good convergence rate, we must find the smallest possible value of $\lambda$ such that $\lambda \geq 2 \|\nabla \mathcal{L}(\theta^*)\|_\infty$. The upper bound on the $\ell_\infty$ norm of $\nabla \mathcal{L}(\theta^*)$ is given by Lemma 3.

Lemma 3. Assume (LF) holds for some $\alpha > 0$. For any $\delta \in (0,1)$:

$$\|\nabla \mathcal{L}(\theta^*)\|_\infty \leq 2 \sqrt{\frac{\log m}{\alpha n^{1-\delta}}} \quad \text{w.p. } 1 - \frac{1}{e^{n^\delta \log m}}$$

The proof of Lemma 3 relies crucially on Azuma-Hoeffding's inequality, which allows us to handle correlated observations. This departs from the usual assumption made in sparse recovery settings that the measurements are independent of one another. We now show how to use Theorem 1 to recover the support of $\theta^*$, that is, to solve the Network Inference problem.

Corollary 1. Under the same assumptions as Theorem 1, let $\hat{\mathcal{S}}_\eta \equiv \{ j \in \{1, \dots, m\} : \hat{\theta}_j > \eta \}$ for $\eta > 0$. For $0 < \epsilon < \eta$, let $\mathcal{S}^*_{\eta+\epsilon} \equiv \{ i \in \{1, \dots, m\} : \theta_i^* > \eta + \epsilon \}$ be the set of all true 'strong' parents. Suppose the number of measurements verifies: $n > \frac{9 s \log m}{\alpha \gamma^2 \epsilon^2}$. Then with probability $1 - \frac{1}{m}$, $\mathcal{S}^*_{\eta+\epsilon} \subseteq \hat{\mathcal{S}}_\eta \subseteq \mathcal{S}^*$. In other words we recover all 'strong' parents and no 'false' parents.

Assuming we know a lower bound $\alpha$ on $\Theta_{i,j}$, Corollary 1 can be applied to the Network Inference problem in the following manner: pick $\epsilon = \frac{\eta}{2}$ and $\eta = \frac{\alpha}{3}$, then $\mathcal{S}^*_{\eta+\epsilon} = \mathcal{S}^*$ provided that $n = \Omega\big(\frac{s \log m}{\alpha^3 \gamma^2}\big)$. That is, the support of $\theta^*$ can be found by thresholding $\hat{\theta}$ to the level $\eta$.

3.2. Approximate Sparsity

In practice, exact sparsity is rarely verified. For social networks in particular, it is more realistic to assume that each node has few "strong" parents and many "weak" parents. In other words, even if $\theta^*$ is not exactly $s$-sparse, it can be well approximated by $s$-sparse vectors.

Rather than obtaining an impossibility result, we show that the bounds obtained in Section 3.1 degrade gracefully in this setting. Formally, let $\theta^*_{\lfloor s \rfloor} \in \operatorname*{argmin}_{\|\theta\|_0 \leq s} \|\theta - \theta^*\|_1$ be the best $s$-approximation to $\theta^*$. Then we pay a cost proportional to $\|\theta^* - \theta^*_{\lfloor s \rfloor}\|_1$ for recovering the weights of non-exactly sparse vectors. This cost is simply the "tail" of $\theta^*$: the sum of the $m - s$ smallest coordinates of $\theta^*$. We recover the results of Section 3.1 in the limit of exact sparsity. These results are formalized in the following theorem, which is also a consequence of Theorem 1 in (Negahban et al., 2012).

Theorem 2. Suppose the (RE) assumption holds for the Hessian $\nabla^2 \mathcal{L}(\theta^*)$ and $\tau_{\mathcal{L}}(\theta^*) = \frac{\kappa_2 \log m}{n} \|\theta^*\|_1$ on the following set:

$$\mathcal{C}' \equiv \{\Delta \in \mathbb{R}^m : \|\Delta_{S^c}\|_1 \leq 3 \|\Delta_S\|_1 + 4 \|\theta^* - \theta^*_{\lfloor s \rfloor}\|_1\} \cap \{\|\Delta\|_1 \leq 1\}$$

If the number of measurements $n \geq \frac{64 \kappa_2}{\gamma} s \log m$, then by solving (2) for $\lambda \equiv 2 \sqrt{\frac{\log m}{\alpha n^{1-\delta}}}$ we have:

$$\|\hat{\theta} - \theta^*\|_2 \leq \frac{3}{\gamma} \sqrt{\frac{s \log m}{\alpha n^{1-\delta}}} + 4 \sqrt[4]{\frac{s \log m}{\gamma^4 \alpha n^{1-\delta}}}\ \|\theta^* - \theta^*_{\lfloor s \rfloor}\|_1$$

As in Corollary 1, an edge recovery guarantee can be derived from Theorem 2 in the case of approximate sparsity.

3.3. Restricted Eigenvalue Condition

There exists a large class of sufficient conditions under which sparse recovery is achievable in the context of regularized estimation (van de Geer & Bühlmann, 2009). The restricted eigenvalue condition, introduced in (Bickel et al., 2009b), is one of the weakest such assumptions. It can be interpreted as a restricted form of non-degeneracy. Since we apply it to the Hessian of the log-likelihood function $\nabla^2 \mathcal{L}(\theta)$, it essentially reduces to a form of restricted strong convexity, which Lemma 2 ultimately relies on.

Observe that the Hessian of $\mathcal{L}$ can be seen as a re-weighted Gram matrix of the observations:

$$\nabla^2 \mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|} \sum_{t \in \mathcal{T}} x^t (x^t)^T \bigg[ x_i^{t+1} \frac{f'' f - f'^2}{f^2}(\theta^* \cdot x^t) - (1 - x_i^{t+1}) \frac{f''(1 - f) + f'^2}{(1 - f)^2}(\theta^* \cdot x^t) \bigg]$$


If $f$ and $(1 - f)$ are $c$-strictly log-convex for $c > 0$, then $\min\big((\log f)'', (\log(1 - f))''\big) \geq c$. This implies that the $(S, \gamma)$-(RE) condition in Theorem 1 and Theorem 2 reduces to a condition on the Gram matrix of the observations $X^T X = \frac{1}{|\mathcal{T}|} \sum_{t \in \mathcal{T}} x^t (x^t)^T$ for $\gamma' \equiv \gamma \cdot c$.
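The Gram matrix of the observations is easy to compute from data, which gives a cheap sanity check related to the (RE) condition. The sketch below is an illustration, not a verification of (RE): checking the condition over the whole cone is harder, and the code only uses the fact that vectors supported on $S$ belong to the cone, so the smallest eigenvalue of the Gram matrix restricted to $S$ is an upper bound on any $\gamma$ for which the condition can hold. The normalization by the number of rows and the function names are assumptions made here.

```python
import numpy as np

def gram_matrix(X):
    """Empirical Gram matrix (1/n) * sum_t x^t (x^t)^T; rows of X are the x^t."""
    X = np.asarray(X, dtype=float)
    return X.T @ X / X.shape[0]

def support_min_eigenvalue(G, support):
    """Smallest eigenvalue of G restricted to the rows and columns in `support`.

    Vectors supported on S lie in the cone of Definition 2, so this value
    upper-bounds every gamma for which G satisfies the (S, gamma)-(RE) condition.
    """
    sub = G[np.ix_(support, support)]
    return float(np.linalg.eigvalsh(sub)[0])

# Hypothetical measurements over 3 nodes; parents of interest are nodes 0 and 1.
X = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])
gamma_upper = support_min_eigenvalue(gram_matrix(X), [0, 1])

# Perfectly collinear measurements: the two parents are always active together.
X_collinear = np.array([[1, 1], [1, 1]])
gamma_collinear = support_min_eigenvalue(gram_matrix(X_collinear), [0, 1])
```

A value close to zero, as in the collinear example, signals that the parents cannot be told apart from the measurements, which is the situation the (RE) condition rules out.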

(RE) with high probability The Generalized Linear Cascade model yields a probability distribution over the observed sets of infected nodes $(x^t)_{t \in \mathcal{T}}$. It is then natural to ask whether the restricted eigenvalue condition is likely to occur under this probabilistic model. Several recent papers show that large classes of correlated designs obey the restricted eigenvalue property with high probability (Raskutti et al., 2010; Rudelson & Zhou, 2013).

The (RE)-condition has the following concentration property: if it holds for the expected Hessian matrix $\mathbb{E}[\nabla^2 \mathcal{L}(\theta^*)]$, then it holds for the finite sample Hessian matrix $\nabla^2 \mathcal{L}(\theta^*)$ with high probability.

Therefore, under an assumption which only involves the probabilistic model and not the actual observations, we can obtain the same conclusion as in Theorem 1:

Proposition 1. Suppose $\mathbb{E}[\nabla^2 \mathcal{L}(\theta^*)]$ verifies the $(S, \gamma)$-(RE) condition and assume (LF) and (LF2). For $\delta > 0$, if $n^{1-\delta} \geq \frac{1}{28 \gamma \alpha} s^2 \log m$, then $\nabla^2 \mathcal{L}(\theta^*)$ verifies the $(S, \frac{\gamma}{2})$-(RE) condition, w.p. $\geq 1 - e^{-n^\delta \log m}$.

Observe that the number of measurements required in Proposition 1 is now quadratic in $s$. If we only keep the first measurement from each cascade, which are independent, we can apply Theorem 1.8 from (Rudelson & Zhou, 2013), lowering the number of required cascades to $s \log m \log^3(s \log m)$.

If $f$ and $(1 - f)$ are strictly log-convex, then the previous observations show that the quantity $\mathbb{E}[\nabla^2 \mathcal{L}(\theta^*)]$ in Proposition 1 can be replaced by the expected Gram matrix: $A \equiv \mathbb{E}[X^T X]$. This matrix $A$ has a natural interpretation: the entry $a_{i,j}$ is the probability that node $i$ and node $j$ are infected at the same time during a cascade. In particular, the diagonal term $a_{i,i}$ is simply the probability that node $i$ is infected during a cascade.

4. A Lower Bound 

In (Netrapalli & Sanghavi, 2012), the authors derive a lower bound of $\Omega(s \log \frac{m}{s})$ on the number of cascades necessary to achieve good support recovery with constant probability under a correlation decay assumption. In this section, we will consider the stable sparse recovery setting of Section 3.2. Our goal is to obtain an information-theoretic lower bound on the number of measurements necessary to approximately recover the parameter $\theta^*$ of a cascade model from observed cascades. Similar lower bounds were obtained for sparse linear inverse problems in (Price & Woodruff, 2011; 2012; Ba et al., 2011).

Theorem 3. Let us consider a cascade model of the form (1) and a recovery algorithm $\mathcal{A}$ which takes as input $n$ random cascade measurements and outputs $\hat{\theta}$ such that with probability $\delta > \frac{1}{2}$ (over the measurements):

$$\|\hat{\theta} - \theta^*\|_2 \leq C \min_{\|\theta\|_0 \leq s} \|\theta - \theta^*\|_2 \tag{5}$$

where $\theta^*$ is the true parameter of the cascade model. Then $n = \Omega\big(s \log \frac{m}{s} / \log C\big)$.

This theorem should be contrasted with Theorem 2: up to an additive $s \log s$ factor, the number of measurements required by our algorithm is tight. The proof of Theorem 3 follows an approach similar to (Price & Woodruff, 2012). We present a sketch of the proof in the Appendix and refer the reader to their paper for more details.

5. Experiments 

In this section, we validate empirically the results and assumptions of Section 3 for varying levels of sparsity and different initializations of parameters $(n, m, \lambda, p_{\text{init}})$, where $p_{\text{init}}$ is the initial probability of a node being a source node. We compare our algorithm to two different state-of-the-art algorithms: GREEDY and MLE from (Netrapalli & Sanghavi, 2012). As an extra benchmark, we also introduce a new algorithm LASSO, which approximates our SPARSE MLE algorithm.

Experimental setup We evaluate the performance of the algorithms on synthetic graphs, chosen for their similarity to real social networks. We therefore consider a Watts-Strogatz graph (300 nodes, 4500 edges) (Watts & Strogatz, 1998), a Barabasi-Albert graph (300 nodes, 16200 edges) (Albert & Barabasi, 2001), a Holme-Kim power law graph (200 nodes, 9772 edges) (Holme & Kim, 2002), and the recently introduced Kronecker graph (256 nodes, 10000 edges) (Leskovec et al., 2010). Undirected graphs are converted to directed graphs by doubling the edges.

For every reported data point, we sample edge weights and generate $n$ cascades from the (IC) model for $n \in \{100, 500, 1000, 2000, 5000\}$. We compare for each algorithm the estimated graph $\hat{\mathcal{G}}$ with $\mathcal{G}$. The initial probability of a node being a source is fixed to 0.05, i.e. an average of 15 source nodes per cascade for all experiments, except for Figure 2 (f). All edge weights are chosen uniformly in the interval $[0.2, 0.7]$, except when testing for approximately sparse graphs (see paragraph on robustness). Adjusting for the variance of our experiments, all data points are reported with at most a $\pm 1$ error margin. The parameter $\lambda$ is chosen to be of the order $O(\sqrt{\log m / (\alpha n)})$. We report our results as a function of the number of cascades and not the number of measurements: in practice, very few cascades have depth greater than 3.

Benchmarks We compare our sparse mle algorithm 
to 3 benchmarks: GREEDY and MLE from (Netrapalli & 
Sanghavi, 2012) and LASSO. The MLE algorithm is a 
maximum-likelihood estimator without fi-norm penaliza¬ 
tion. GREEDY is an iterative algorithm. We introduced the 
LASSO algorithm in our experiments to achieve faster com¬ 
putation time: 

0^ e argmm^ \f(0i ■ x*) - -f A||6»j||i 

KT 

Lasso has the merit of being both easier and faster to optimize
numerically than the other convex-optimization based
algorithms. It approximates the SPARSE MLE algorithm by
making the assumption that the observations are of the
form: $x_i^{t+1} = f(\theta_i\cdot x^t) + \epsilon$, where $\epsilon$ is random white noise.
This is not valid in theory since $\epsilon$ depends on $f(\theta_i\cdot x^t)$;
however, the approximation is validated in practice.
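
To make the LASSO objective concrete, here is a minimal sketch that evaluates the penalized squared-error loss for a single node. The function name `lasso_objective` and the toy data are hypothetical. The default inverse link $f(z) = 1 - e^{z}$ (with $\theta_{ij} = \log(1 - p_{ij})$) is an assumption about the form $f$ takes for the IC model and should be replaced by the link of the model at hand.

```python
import numpy as np

def lasso_objective(theta_i, X, y, lam, f=lambda z: 1.0 - np.exp(z)):
    """Squared-error loss plus l1 penalty for one node i (illustrative sketch).

    X: (T, m) array whose rows are the observed states x^t.
    y: (T,) array of outcomes x_i^{t+1}.
    f: inverse link function; the default is an assumed IC-style link.
    """
    residual = f(X @ theta_i) - y
    return np.sum(residual ** 2) + lam * np.sum(np.abs(theta_i))

# Toy data: two measurements over two candidate parents.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 0.0])
theta = np.array([np.log(0.5), 0.0])   # corresponds to p = 0.5 for the first parent
value = lasso_objective(theta, X, y, lam=0.1)
```

Any off-the-shelf solver for $\ell_1$-penalized problems could then be used to minimize this quantity; the sketch only evaluates it.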

We did not benchmark against other known algorithms
(NETRATE (Gomez-Rodriguez et al., 2011) and FIRST
EDGE (Abrahao et al., 2013)) due to the discrete-time assumption.
These algorithms also suppose a single-source
model, whereas SPARSE MLE, MLE, and GREEDY do not.
Learning the graph in the case of a multi-source cascade
model is harder (see Figure 2 (f)) but more realistic, since
we rarely have access to “patient 0” in practice.

[Figure 2 panels: (c) Holme-Kim (Prec-Recall); (d) Sparse Kronecker ($\ell_2$-norm vs. $n$); (e) Non-sparse Kronecker ($\ell_2$-norm vs. $n$); (f) Watts-Strogatz (F1 vs. $p_{\text{init}}$)]

Figure 2. Figures (a) and (b) report the F1-score in log scale for 2 graphs as a function of the number of cascades $n$: (a) Barabasi-Albert
graph, 300 nodes, 16200 edges, (b) Watts-Strogatz graph, 300 nodes, 4500 edges. Figure (c) plots the Precision-Recall curve for various
values of $\lambda$ for a Holme-Kim graph (200 nodes, 9772 edges). Figures (d) and (e) report the $\ell_2$-norm $\|\hat{\Theta} - \Theta\|_2$ for a Kronecker graph
which is: (d) exactly sparse, (e) non-exactly sparse, as a function of the number of cascades $n$. Figure (f) plots the F1-score for the
Watts-Strogatz graph as a function of $p_{\text{init}}$.

Graph Estimation. In the case of the LASSO, MLE and
SPARSE MLE algorithms, we construct the edges of $\hat{\mathcal{G}}$ as
$\cup_{j\in V}\{(i,j) : \hat{\Theta}_{ij} > 0.1\}$, i.e. by thresholding. Finally, we
report the F1-score $= 2\cdot\text{precision}\cdot\text{recall}/(\text{precision}+\text{recall})$,
which considers (1) the number of true edges recovered by
the algorithm over the total number of edges returned by
the algorithm (precision) and (2) the number of true edges
recovered by the algorithm over the total number of edges
it should have recovered (recall). Over all experiments,
SPARSE MLE achieves higher precision, recall, and
F1-score than the other algorithms. Interestingly, both MLE and SPARSE MLE
perform exceptionally well on the Watts-Strogatz graph.
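
The thresholding and scoring steps can be sketched as follows. This is a minimal illustration, not the evaluation code of the paper: the function name `f1_from_weights` and the toy matrices are hypothetical, and only the threshold 0.1 and the F1 formula come from the text.

```python
import numpy as np

def f1_from_weights(theta_hat, true_adj, threshold=0.1):
    """Threshold estimated weights into edges and score them (illustrative sketch).

    theta_hat: (m, m) array of estimated edge weights.
    true_adj:  (m, m) boolean array, True where a true edge exists.
    Returns (f1, precision, recall).
    """
    pred = theta_hat > threshold
    true_pos = np.sum(pred & true_adj)
    precision = true_pos / pred.sum() if pred.sum() else 0.0
    recall = true_pos / true_adj.sum() if true_adj.sum() else 0.0
    if precision + recall == 0:
        return 0.0, precision, recall
    return 2 * precision * recall / (precision + recall), precision, recall

# Toy example: two true edges, one recovered, plus one false positive.
theta_hat = np.array([[0.0, 0.5], [0.05, 0.3]])
true_adj = np.array([[False, True], [True, False]])
f1, precision, recall = f1_from_weights(theta_hat, true_adj)
```

In this toy case one of the two returned edges is correct and one of the two true edges is found, so precision, recall, and F1 all equal 0.5.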

Quantifying robustness. The previous experiments only
considered graphs with strong edges. To test the algorithms
in the approximately sparse case, we add sparse edges to
the previous graphs: every non-edge is added according to a Bernoulli
variable of parameter 1/3, with a weight drawn uniformly
from [0, 0.1]. The non-sparse case is compared to
the sparse case in Figure 2 (d)-(e) for the $\ell_2$-norm, showing
that LASSO, followed by SPARSE MLE, is the
most robust to noise.
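
The perturbation used for the approximately sparse case can be sketched as below. The helper name `add_weak_edges` and the exclusion of self-loops are assumptions made for this example; the Bernoulli parameter 1/3 and the weight interval [0, 0.1] are taken from the text.

```python
import numpy as np

def add_weak_edges(W, rng, prob=1/3, max_weight=0.1):
    """Return a copy of W where each non-edge becomes a weak edge w.p. `prob`.

    W: (m, m) array of edge weights, 0 meaning "no edge".
    Assumption of this sketch: self-loops are never added.
    """
    W = W.copy()
    non_edge = (W == 0)
    np.fill_diagonal(non_edge, False)
    chosen = non_edge & (rng.random(W.shape) < prob)
    W[chosen] = rng.uniform(0.0, max_weight, size=chosen.sum())
    return W

# Toy example: a single strong edge on 20 nodes.
rng = np.random.default_rng(1)
W = np.zeros((20, 20))
W[0, 1] = 0.5
W_noisy = add_weak_edges(W, rng)
```

Strong edges are left untouched, so the resulting weight matrix is no longer exactly sparse but remains close to the original in $\ell_2$-norm.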


6. Future Work 

Solving the Graph Inference problem with sparse recovery
techniques opens new avenues for future work. Firstly, the
sparse recovery literature has already studied regularization
patterns beyond the $\ell_1$-norm, notably the thresholded
and adaptive lasso (van de Geer et al., 2011; Zou, 2006).
Another goal would be to obtain confidence intervals for
our estimator, similarly to what has been obtained for the
Lasso in the recent series of papers (Javanmard & Montanari,
2014; Zhang & Zhang, 2014).

Finally, the linear threshold model is a commonly studied
diffusion process and can also be cast as a generalized
linear cascade with inverse link function $z \mapsto \mathbb{1}_{z > 0}$:

$$X_j^{t+1} = \text{sign}\left(\theta_j\cdot X^t - t_j\right)$$

This model therefore falls
into the 1-bit compressed sensing framework (Boufounos
& Baraniuk, 2008). Several recent papers study the theoretical
guarantees obtained for 1-bit compressed sensing
with specific measurements (Gupta et al., 2010; Plan &
Vershynin, 2014). Whilst they obtained bounds of the order
$O(s\log\frac{m}{s})$, no current theory exists for recovering positive
bounded signals from binary measurements. This research
direction may provide the first clues to solve the “adaptive
learning” problem: if we are allowed to adaptively choose
the source nodes at the beginning of each cascade, how
much can we improve the current results?
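
For intuition, one step of the linear threshold dynamics mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the function name `linear_threshold_step` is hypothetical, `Theta[i, j]` is taken to be the weight of edge $i \to j$ (so that $\theta_j$ is the $j$-th column), and the output uses the 0/1 indicator $\mathbb{1}_{z>0}$ rather than a $\pm 1$ sign.

```python
import numpy as np

def linear_threshold_step(Theta, x, thresholds):
    """One step of a linear threshold update (illustrative sketch).

    Theta[i, j]: weight of edge i -> j, so column j is theta_j.
    x: 0/1 vector of active nodes at time t.
    thresholds: per-node thresholds t_j.
    Returns the 0/1 vector with entries 1{theta_j . x - t_j > 0}.
    """
    return (Theta.T @ x - thresholds > 0).astype(int)

# Toy example: node 0 is active and pushes node 1 over its threshold.
Theta = np.array([[0.0, 0.6], [0.0, 0.0]])
x = np.array([1, 0])
thresholds = np.array([0.5, 0.5])
x_next = linear_threshold_step(Theta, x, thresholds)
```

Each observation is a single bit per node, which is what connects this model to 1-bit compressed sensing.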
7. Appendix

In this appendix, we provide the missing proofs of Section 3
and Section 4. We also show additional experiments
on the running time of our recovery algorithm which could
not fit in the main part of the paper.

7.1. Proofs of Section 3 

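Proof of Lemma 1. Using the inequality $\forall x > 0$, $\log x \ge
1 - \frac{1}{x}$, we have $\left|\log\left(\frac{1}{1-p}\right) - \log\left(\frac{1}{1-p'}\right)\right| \ge \max\left(1 -
\frac{1-p}{1-p'},\ 1 - \frac{1-p'}{1-p}\right) \ge \max(p - p',\ p' - p)$. □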
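
The inequality at the heart of Lemma 1, namely $|\log\frac{1}{1-p} - \log\frac{1}{1-p'}| \ge |p - p'|$ for probabilities $p, p' \in [0, 1)$, can be sanity-checked numerically. The sketch below is only a check on random inputs, not a proof; the sample size and the upper bound 0.99 on the sampled probabilities are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.uniform(0.0, 0.99, size=100_000)
p_prime = rng.uniform(0.0, 0.99, size=100_000)

# Left-hand side: distance between the log-transformed parameters.
lhs = np.abs(np.log(1.0 / (1.0 - p)) - np.log(1.0 / (1.0 - p_prime)))
# Right-hand side: distance between the probabilities themselves.
rhs = np.abs(p - p_prime)

# Count violations, allowing a tiny tolerance for floating-point error.
violations = int(np.sum(lhs < rhs - 1e-12))
```

No violation is expected, since $p \mapsto \log\frac{1}{1-p}$ has derivative $\frac{1}{1-p} \ge 1$ on $[0, 1)$.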


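Proof of Lemma 3. The gradient of $\mathcal{L}$ is given by:

$$\nabla\mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|}\sum_{t\in\mathcal{T}} x^t\left[x_i^{t+1}\frac{f'}{f}(\theta^*\cdot x^t) - (1 - x_i^{t+1})\frac{f'}{1-f}(\theta^*\cdot x^t)\right]$$

Let $\partial_j\mathcal{L}(\theta^*)$ be the $j$-th coordinate of $\nabla\mathcal{L}(\theta^*)$. Writing
$\partial_j\mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|}\sum_{t\in\mathcal{T}} Y_t$ and since $\mathbb{E}[x_i^{t+1}|x^t] = f(\theta^*\cdot x^t)$,
we have that $\mathbb{E}[Y_{t+1}|Y_t] = 0$. Hence $Z_t = \sum_{k=1}^{t} Y_k$ is a
martingale.

Using assumption (LF), we have almost surely $|Z_{t+1} -
Z_t| \le \frac{1}{\alpha}$ and we can apply Azuma's inequality to $Z_t$:

$$\mathbb{P}\big[|Z_{\mathcal{T}}| \ge \lambda\big] \le 2\exp\big([\ldots]\big)$$

Applying a union bound to have the previous inequality
hold for all coordinates of $\nabla\mathcal{L}(\theta^*)$ implies:

$$\mathbb{P}\big[\|\nabla\mathcal{L}(\theta^*)\|_\infty \ge \lambda\big] \le 2m\exp\big([\ldots]\big)$$

Choosing $\lambda =$ […] concludes the proof. □

Proof of Corollary 1. By choosing $\delta = 0$, if $n >$ […],
then $\|\hat{\theta} - \theta^*\|_2 < \epsilon < \eta$ with probability $1 - \frac{1}{m}$. If $\theta_i^* = 0$
and $\hat{\theta}_i > \eta$, then $\|\hat{\theta} - \theta^*\|_2 \ge |\hat{\theta}_i - \theta_i^*| > \eta$, which is
a contradiction. Therefore we get no false positives. If
$\theta_i^* > \eta + \epsilon$, then $|\hat{\theta}_i - \theta_i^*| < \epsilon \implies \hat{\theta}_i > \eta$ and we get all
strong parents. □

(RE) with high probability. We now prove Proposition 1.
The proof mostly relies on showing that the Hessian
of the likelihood function $\mathcal{L}$ is sufficiently well concentrated
around its expectation.

Proof. Writing $H \equiv \nabla^2\mathcal{L}(\theta^*)$, if $\forall\Delta\in\mathcal{C}(S)$, $\|\mathbb{E}[H] -
H\|_\infty \le \lambda$ and $\mathbb{E}[H]$ verifies the $(S, \gamma)$-(RE) condition
then:

$$\forall\Delta\in\mathcal{C}(S),\quad \Delta H\Delta \ge \Delta\mathbb{E}[H]\Delta\,(1 - 32s\lambda/\gamma) \qquad (6)$$

Indeed, $|\Delta(H - \mathbb{E}[H])\Delta| \le 2\lambda\|\Delta\|_1^2 \le 2\lambda(4\sqrt{s}\|\Delta_S\|_2)^2$.
Writing $\partial^2_{i,j}\mathcal{L}(\theta^*) = \frac{1}{|\mathcal{T}|}\sum_{t\in\mathcal{T}} Y_t$ and using (LF) and
(LF2) we have $|Y_t - \mathbb{E}[Y_t]| \le$ […]. Applying Azuma's
inequality as in the proof of Lemma 3, this implies:

$$\mathbb{P}\big[\|\mathbb{E}[H] - H\|_\infty \ge \lambda\big] \le 2\exp\big(-[\ldots]\,n\alpha\lambda^2 + 2\log m\big)$$

Thus, if we take $\lambda =$ […], $\|\mathbb{E}[H] - H\|_\infty \le \lambda$ w.p.
at least $1 - e^{-n^\delta\log m}$. When $n^{1-\delta} \ge$ […] $s^2\log m$, (6)
implies $\forall\Delta\in\mathcal{C}(S)$, $\Delta H\Delta \ge \frac{1}{2}\Delta\mathbb{E}[H]\Delta$, w.p. at least
$1 - e^{-n^\delta\log m}$ and the conclusion of Proposition 1 follows. □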


7.2. Proof of Theorem 3 

Let us consider an algorithm $\mathcal{A}$ which verifies the recovery
guarantee of Theorem 3: there exists a probability distribution
over measurements such that for all vectors $\theta^*$, (5)
holds w.p. $\delta$. This implies by the probabilistic method that
for every distribution $D$ over vectors $\theta$, there exists an $n \times m$
measurement matrix $X_D$ such that (5) holds w.p. $\delta$ ($\theta$
is now the random variable).

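Consider the following distribution $D$: choose $S$ uniformly
at random from a “well-chosen” set of $s$-sparse
supports $\mathcal{F}$ and $t$ uniformly at random from $X \equiv \{t \in
\{-1, 0, 1\}^m \mid \mathrm{supp}(t) \in \mathcal{F}\}$. Define $\theta = t + w$ where
$w \sim \mathcal{N}(0, \alpha\frac{s}{m}I_m)$ and $\alpha = \Omega(\frac{1}{C})$.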

Consider the following communication game between Alice
and Bob: (1) Alice sends $y$, drawn from a
Bernoulli distribution of parameter $f(X_D\theta)$, to Bob. (2)
Bob uses $\mathcal{A}$ to recover $\hat{\theta}$ from $y$. It can be shown that at
the end of the game Bob now has a quantity of information
$\Omega(s\log\frac{m}{s})$ about $S$. By the Shannon-Hartley theorem, this
information is also upper-bounded by $O(n\log C)$. These
two bounds together imply the theorem.

7.3. Running Time Analysis 

We include here a running time analysis of our algorithm.
In Figure 3, we compare our algorithm to the benchmark
algorithms for increasing values of the number of nodes.
In Figure 4, we compare our algorithm to the benchmarks
for a fixed graph but for an increasing number of observed
cascades.

In both figures, unsurprisingly, the simple greedy algorithm
is the fastest. Even though both the MLE algorithm
and the algorithm we introduced are based on convex optimization,
the MLE algorithm is faster. This is due to the
overhead caused by the $\ell_1$-regularization in (2).

The running time depends linearly on the number of cascades,
as expected. The slope is largest
for our algorithm, which is again caused by the overhead
induced by the $\ell_1$-regularization.

Figure 3. Running time analysis for estimating the parents of a
single node on a Barabasi-Albert graph as a function of the number
of nodes in the graph. The parameter $k$ (number of nodes each
new node is attached to) was set to 30. $p_{\text{init}}$ is chosen equal to 0.15,
and the edge weights are chosen uniformly at random in [0.2, 0.7].
The penalization parameter $\lambda$ is chosen equal to 0.1.



Figure 4. Running time analysis for estimating the parents of a
single node on a Barabasi-Albert graph as a function of the number
of total observed cascades. The parameters defining the graph
were set as in Figure 3.